Journal article

Heads-Join: Efficient Earth Mover's Distance Similarity Joins on Hadoop

J Huang, R Zhang, R Buyya, J Chen, Y Wu

IEEE Transactions on Parallel and Distributed Systems | IEEE COMPUTER SOC | Published: 2016

Abstract

The Earth Mover's Distance (EMD) similarity join has a number of important applications, such as near-duplicate image retrieval and distribution-based pattern analysis. However, the computational cost of EMD is supercubic, and consequently the EMD similarity join operation is prohibitive for datasets of even medium size. We propose to employ the Hadoop platform to speed up the operation. Simply porting the state-of-the-art metric distance similarity join algorithms to Hadoop results in inefficiency because they involve excessive distance computations and are vulnerable to skewed data distributions. We propose a novel framework, named Heads-Join, which transforms data into the space of EMD lowe..



Grants

Awarded by Australian Research Council


Funding Acknowledgements

This work is supported in part by the Australian Research Council (ARC) Discovery Project DP130104587, the National High-Tech R&D (863) Program of China (2013AA01A213), the Natural Science Foundation of China (61433008, 61373145, 61170210, U1435216), and the Chinese Special Project of Science and Technology (2013zx01039-002-002). Dr. Zhang and Dr. Buyya are supported by the ARC Future Fellowships Projects FT120100832 and FT120100545, respectively. Prof. Chen is supported by the Fundamental Research Funds for the Central Universities (Grant No. 2015ZZ029). Jian Chen is the corresponding author.